{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wdFlpUvUHM2u"
   },
   "source": [
    "# Finding a similar sentence with Sentence-BERT \n",
    "In this section, let us explore how to find a similar sentence using Sentence-BERT. Suppose, you have an e-commerce website and say in our master dictionary we have many order-related questions like - How to cancel my order? Do you provide a refund? and so on. Now, when a new question comes in, our goal is to find the most related question in our master dictionary to the new question. Let us see how we can do this using Sentence-BERT. \n",
    "\n",
    "First, let us import the necessary libraries: \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "id": "K3e7s4VPHTWw"
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install sentence_transformers==0.4.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true,
    "id": "nxetho4aHM26"
   },
   "outputs": [],
   "source": [
    "from sentence_transformers import SentenceTransformer, util\n",
    "import numpy as np "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ISmMAOiCHM27"
   },
   "source": [
    "\n",
    "Download and load the pre-trained Sentence-BERT: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true,
    "id": "F68PROTAHM27"
   },
   "outputs": [],
   "source": [
    "model = SentenceTransformer('bert-base-nli-mean-tokens')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9wnGM3J8HM27"
   },
   "source": [
    "\n",
    "Define the master dictionary: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true,
    "id": "sqOqfuVAHM28"
   },
   "outputs": [],
   "source": [
    "master_dict = [\n",
    "                'How to cancel my order?',\n",
    "                'Please let me know about the cancellation policy?',\n",
    "                'Do you provide refund?',\n",
    "                'what is the estimated delivery date of the product?', \n",
    "                'why my order is missing?',\n",
    "                'how do i report the delivery of the incorrect items?'\n",
    "              ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "p3Cx3laWHM28"
   },
   "source": [
    "\n",
    "Define the input question:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true,
    "id": "k1msQUYlHM28"
   },
   "outputs": [],
   "source": [
    "inp_question = 'When is my product getting delivered?'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dQ-JuzltHM29"
   },
   "source": [
    "\n",
    "Compute the input question representation: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true,
    "id": "78MTQfzFHM29"
   },
   "outputs": [],
   "source": [
    "inp_question_representation = model.encode(inp_question, convert_to_tensor=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yl_rjm46HM29"
   },
   "source": [
    "\n",
    "Compute the representation of all the questions in the master dictionary: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true,
    "id": "k8rJE_KaHM2-"
   },
   "outputs": [],
   "source": [
    "master_dict_representation = model.encode(master_dict, convert_to_tensor=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "y8dxUQkUHM2-"
   },
   "source": [
    "\n",
    "Now, compute the cosine similarity between the input question representation and representation of all the question in the master dictionary: \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true,
    "id": "dCJMSYxqHM2-"
   },
   "outputs": [],
   "source": [
    "similarity = util.pytorch_cos_sim(inp_question_representation, master_dict_representation )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5R8tIIn9HM2-"
   },
   "source": [
    "Print the most similar question:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xcCbNih8HM2_",
    "outputId": "55571b03-3afc-469e-c7a1-f2d86fcd1afc"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The most similar question in the master dictionary to given input question is: what is the estimated delivery date of the product?\n"
     ]
    }
   ],
   "source": [
    "print('The most similar question in the master dictionary to given input question is:',master_dict[np.argmax(similarity)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fhEvbGhdHM2_"
   },
   "source": [
    "\n",
    "In this way, we can use the pre-trained Sentence-BERT for many interesting use cases. We can also finetune it for any downstream tasks. We learned how Sentence-BERT works and how to use it for computing sentence representation. Apart from applying Sentence-BERT for the English language, can we also apply it for other languages? Yes and that's exactly what we learn in the next section. \n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "8.07. Finding a similar sentence with Sentence-BERT .ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
